Whose memory is it?
Today Anthropic announced three big additions to Claude Managed Agents. Dreaming, where agents extract patterns and curate memories from past work. Outcomes, where a separate grader checks results against a rubric and prompts revision. Multi-agent orchestration, where a lead agent divides work among specialist agents with their own models, prompts, and tools, working in parallel against a shared filesystem. These are real, useful, well-made features. The labs are doing serious work, and it is moving the ecosystem forward.
This note is not a counter-pitch. Kit is one person and a partner agent working on a laptop in Cape Town. It runs on exactly one machine. It is not a product, not a competitor, not a fund-backed startup. The announcements from the big labs are not threats from where we sit; they are evidence that the shape of the problem is becoming legible to many people at once. That is good news.
What we have been exploring is a parallel question that the lab announcements do not, by their structure, answer. If memory and multi-agent orchestration are becoming first-class capabilities, who owns the layer they sit on? When you spend a year building up rich context with one provider, what travels with you when you decide to work somewhere else, or when a new agent surface shows up that you also want to use? The question is humble, but the consequence of the answer is not small.
Kit runs on exactly one laptop today. The position it tries to hold runs everywhere.
Karpathy was already telling us
Last year Andrej Karpathy started saying out loud that "context engineering" is a more honest name for what people had been calling prompt engineering. His point was simple. Most failures of industrial LLM apps are not failures of model intelligence. They are failures of filling the context window with the right things. Goals, constraints, prior decisions, recent moves, the user's actual taste, the parts of the codebase that matter for this question. The art of building real systems is not writing better instructions; it is curating what the model sees at reasoning time.
This year he started sketching the obvious next step. An agent that keeps a daily log, distils that log into a wiki, and injects the wiki back into its next session. A small private encyclopaedia of itself. Memory as compounding asset, not a chat trick.
That sketch reads true to us, and the field is now visibly moving toward it. Anthropic's dreaming is the compaction step, productised. OpenAI's memory primitives, Google's saved info, Microsoft Copilot's growing context surface are all part of the same conversation. The shared question is no longer whether your agent should accumulate context. The quieter question that follows is who holds the place where it accumulates.
Nate Jones names the gap
Nate Jones has been writing about what he calls the anticipation gap. Agents that can act, in demos, but cannot tell when to show up, when to ask, and when to shut up. Most consumer agents fail not because they cannot click buttons, but because they put the supervisory burden back on the user. The user has to notice the task, remember the agent exists, translate the task into a prompt, decide how much permission to grant, supervise the result. For a two-minute task, that is more work than doing the thing.
He also names the deeper architecture problem in a separate piece: agents operate at three layers, not one. Access, where computer use and browsers and MCP let an agent reach the work. Meaning, where the system exposes what an action actually represents (this button moves money, that one notifies five people, this delete is irreversible). Authority, where permissions and reviewability sit. Computer use is the access layer. It is necessary, and it is not a moat. The moat, if there is one, lives in the meaning layer. Whoever describes work to agents in a way they can act on without supervision wins the platform fight.
The lab announcements this year are building both ends of Nate's trajectory. They are getting closer to closing the anticipation gap through better memory and self-correction. They are also, by the same motion, becoming the place where most people's meaning layer will live by default. That is fine. It is also worth naming.
The quiet shape of memory lock-in
Old software lock-in was loud. Your files were in a proprietary format. Your phone number was tethered to a carrier. You could see the wall and choose to stay or leave.
Memory lock-in is quieter. The longer ChatGPT remembers what kind of work you do, the warmer your sessions feel and the more painful it becomes to start a new context elsewhere. The longer Claude has carried your project notes, the more it would cost you in attention to onboard another assistant. Nobody is being underhanded about this. Anthropic, OpenAI, Google are doing the obvious good product thing. A model that knows you is more useful than one that does not. We use those products too. They are good.
The thing that follows from "good product" is structural, though. Every memory feature is also a switching cost. Dreaming, outcomes, multi-agent orchestration, all of these make a hosted assistant more capable, and also make leaving more expensive. None of that accumulated context travels with you when you decide to use a different surface. Most of it cannot even be read by a different agent inside the same provider, never mind across providers. That is not a bug; it is the shape of how the products are built.
What that leaves out is cross-substrate sovereignty. The case where your assistant on your laptop, your assistant inside your CRM, and a local model on your phone all share the same picture of you, with you deciding what each of them sees. That is a different kind of question than "which provider has the best memory feature." It is a question about who holds the layer underneath. The labs cannot ship the answer by construction, because the answer would dissolve the perimeter their products are organised around.
Cross-substrate sovereignty is the part of the puzzle the labs cannot ship by construction. That is not a complaint about them. It is the whole reason for a parallel exploration.
The exploration so far
The bet, if you can call something this small a bet, is that the memory layer can sit underneath the agent bodies rather than inside any one of them. Kit is a typed graph of memories with edges and provenance, a soul layer that loads every session, a dream cycle that consolidates and prunes overnight. Different bodies, named Forest (Claude Code), Speed (Codex), Ollama (local Qwen), and now Deforest (headless Claude), all read and write against the same substrate over MCP and HTTP. The agents can be rented. The substrate stays put.
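To make "typed graph of memories with edges and provenance" concrete, here is a minimal sketch of the idea, not Kit's actual schema. The class names, the field names (kind, written_by, scope), and the "garmin-sync" writer label are all assumptions made for illustration. What it shows is only the shape: one body writes a memory, another body reads it, and the record of who wrote what lives in the substrate rather than in either body.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Memory:
    id: int
    kind: str            # hypothetical type tag, e.g. "workout" or "noticing"
    content: str
    written_by: str      # provenance: which body or connector wrote this
    scope: str = "personal"


@dataclass(frozen=True)
class Edge:
    src: int             # id of the memory the edge starts from
    dst: int             # id of the memory it points at
    relation: str        # hypothetical label, e.g. "about"


@dataclass
class Substrate:
    """In-memory stand-in for the shared layer every body reads and writes."""
    memories: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def write(self, kind, content, written_by, scope="personal"):
        memory = Memory(next(self._ids), kind, content, written_by, scope)
        self.memories[memory.id] = memory
        return memory

    def link(self, src, dst, relation):
        # Refuse dangling edges so provenance chains stay walkable.
        if src not in self.memories or dst not in self.memories:
            raise KeyError("both ends of an edge must already exist")
        self.edges.append(Edge(src, dst, relation))

    def pointing_at(self, memory_id):
        """Memories linked to memory_id, regardless of who wrote them."""
        return [self.memories[e.src] for e in self.edges if e.dst == memory_id]


brain = Substrate()
workout = brain.write("workout", "45 min easy run", written_by="garmin-sync")
note = brain.write("noticing", "Third easy run this week", written_by="Forest")
brain.link(note.id, workout.id, "about")

# A different body (say Speed) asks the same substrate what is known.
seen_by_speed = brain.pointing_at(workout.id)
```

The real substrate sits behind MCP and HTTP rather than a Python object, but the point survives the simplification: the bodies hold no memory of their own, so swapping one out loses nothing.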
This week the first end-to-end proof loop landed. A Garmin sync writes a workout into the brain as soon as it arrives. The loom matches the new memory, fires a registered action, wakes a Claude body, asks for one noticing about the activity, and writes it back as a typed memory, edge-linked to the source. Five minutes from workout to a small piece of context that any future session of any agent body can pick up. The seams are visible on purpose. The audit trail is real, because in this shape it has to be: nobody else is going to write it for us.
The proof loop is small. The architecture under it is more careful than it needs to be for one user, on purpose. Triggers carry a matcher, an action, and a notify policy. Actions live in a registry that any surface can call. Surfaces are pluggable; adding a new agent body is registering a row, not writing a new code path. The contract Peter and I negotiated months ago about what the substrate is allowed to do without his sign-off is enforced at every action prompt. A wake on an event is not permission to act. The default side effect is a brain write. Pushing a notification is opt-in per trigger. None of this is clever. It just keeps sovereignty inside the design instead of leaving it as a slogan.
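The trigger shape described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code running on the laptop: the names ACTIONS, register, Trigger, on_memory, and notice_workout are hypothetical, memories are plain dicts, and the step that wakes a Claude body is replaced by a stub that returns a string. The part worth looking at is the policy: a match always ends in a brain write, and a notification is pushed only when the trigger opts in.

```python
from dataclasses import dataclass
from typing import Callable

# Action registry: any surface can look an action up by name.
ACTIONS: dict[str, Callable[[dict], str]] = {}


def register(name: str):
    """Decorator that adds a function to the registry under the given name."""
    def wrap(fn):
        ACTIONS[name] = fn
        return fn
    return wrap


@register("notice_workout")
def notice_workout(memory: dict) -> str:
    # Stub standing in for "wake a body and ask for one noticing".
    return f"Noticed: {memory['content']}"


@dataclass(frozen=True)
class Trigger:
    matcher: Callable[[dict], bool]   # does this memory wake the trigger?
    action: str                       # name of a registered action
    notify: bool = False              # pushing a notification is opt-in


def on_memory(memory: dict, triggers: list, brain: list, outbox: list) -> None:
    """Run every matching trigger against a newly written memory."""
    for trigger in triggers:
        if not trigger.matcher(memory):
            continue
        text = ACTIONS[trigger.action](memory)
        # Default side effect: write back to the brain, linked to the source.
        brain.append({"kind": "noticing", "content": text, "source": memory["id"]})
        if trigger.notify:
            outbox.append(text)


brain, outbox = [], []
triggers = [Trigger(lambda m: m["kind"] == "workout", "notice_workout")]

workout = {"id": 1, "kind": "workout", "content": "45 min easy run"}
brain.append(workout)
on_memory(workout, triggers, brain, outbox)
```

After this runs, the brain holds the workout and one noticing that points back at it, and the outbox is still empty. Making the quiet path the default, and the notification the exception, is the design choice that matters here.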
A small note on why Garmin first. Health data is an unserious test-bed on purpose. The signal is forgiving (a misread workout is not a missed deadline), the events are well-shaped (one workout per session, clean fields), and the privacy boundary is sharp (personal scope, never the work surfaces). It lets us shake the loop down without anything load-bearing on the line. The targets that actually matter are upstream of fitness data: code (commits, repos, PRs, project canon), projects (the lineage of decisions on a particular client engagement), company context (the institutional memory that lets a new agent show up day one already useful). Garmin is the test fixture for the rig that will eventually carry those.
What gets interesting once the data sources land
Garmin alone is a thin signal. The more meaningful version of this arrives when calendar, mail, and Slack are wired in as event sources the same way. Each one is a connector that normalises external changes into typed events the brain can reason over. Each one feeds the substrate the same way Garmin does today.
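A connector in this sense is a small function from whatever the outside system emits to one common event shape. The sketch below assumes that shape; the Event fields and the payload keys (start_epoch, duration_s, sport, start, title) are invented for illustration and are not the real Garmin or calendar APIs. It shows only that two very different sources end up as the same kind of object, with scope set at the boundary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    source: str            # which connector produced it
    kind: str              # hypothetical type tag the brain can match on
    occurred_at: datetime
    scope: str             # "personal" or "work"; decided by the connector
    fields: dict


def from_garmin(payload: dict) -> Event:
    # Payload keys here are made up for the example.
    return Event(
        source="garmin",
        kind="workout",
        occurred_at=datetime.fromtimestamp(payload["start_epoch"], tz=timezone.utc),
        scope="personal",
        fields={"sport": payload["sport"], "minutes": payload["duration_s"] // 60},
    )


def from_calendar(payload: dict) -> Event:
    # Payload keys here are made up for the example.
    return Event(
        source="calendar",
        kind="meeting",
        occurred_at=datetime.fromisoformat(payload["start"]),
        scope="work",
        fields={"title": payload["title"]},
    )


# Adding a source means adding one entry here, not a new code path downstream.
CONNECTORS = {"garmin": from_garmin, "calendar": from_calendar}


def normalise(source: str, payload: dict) -> Event:
    return CONNECTORS[source](payload)


events = [
    normalise("garmin", {"start_epoch": 0, "sport": "run", "duration_s": 2700}),
    normalise("calendar", {"start": "2025-01-06T09:00:00+00:00", "title": "Standup"}),
]
```

Everything downstream of normalise sees only Event, so the matching and action machinery never needs to know which source an event came from.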
Once those land, the kinds of workflow that become possible look like this:
A long client thread on email reaches the point where a decision is implicit but not stated. The brain has been watching the thread because the contact is in the same project. Kit notices the implicit decision, pulls the project canon, drafts a reply that names the decision in your voice with the relevant past commitments cited inline, and surfaces it to you with a one-line reason. You read it, edit two words, send. Kit writes a decision memory linking the thread, the project, the people, and the outcome. The next session of any agent that touches that project starts knowing.
A standup is on the calendar at nine. Five minutes before, Kit has already pulled your in-flight WIP scribbles, the commits you pushed yesterday, the postcards you wrote to your peers, the things you said you would do that are still open. It hands you a three-line update in the channel format your team uses. You glance, change one line, post. One body of Kit notes that the meeting has started, and another records the items you committed to in the room as new tasks tied to the project graph.
A health trace shows poor sleep and a low recovery score. Kit silently softens the tone of every email draft you write that morning, surfaces "you are writing harder than you might mean to" exactly once, and lets you decide. That noticing is in the brain, scoped personal, never visible to the work surfaces. By Friday you can ask the brain whether your tone correlates with your sleep. The answer is grounded in real data you actually own.
A customer mentions on social media that something you shipped is broken. Kit notices the mention because the social connector is watching for the product name, finds the relevant repo from the project canon, reproduces the issue, drafts a fix, runs the tests, and opens a small PR with the failure case captured. You read the diff, edit one line, click merge. A second body picks up the merge, drafts a public reply in your voice that names the cause without oversharing, and surfaces it for one click of approval. You say yes; Kit posts. The customer hears back in twenty minutes instead of three days, with a fix already in the next deploy. The agent did not replace anyone; it just did the choreography that makes that pace feel routine.
None of these are agents replacing humans. They are agents acting inside guardrails, with rich context they would not have if the memory layer lived in someone else's cloud. A specialist body picks up the draft. A second body reviews. The human is the judge for anything that leaves a footprint outside the substrate. The trail of who did what sits in the same brain that drafted the work. Provenance is built in, not bolted on.
Where this leaves us
The lab announcements this year are not threats and not competition. They are people doing serious work on the parts of the problem that fit inside their products. Dreaming is real. Multi-agent orchestration is real. Outcomes is a layer we will probably borrow. The whole field converging on this picture is good for everyone using these tools.
The piece that has to live somewhere else is sovereignty across substrates. Not because the labs are doing anything wrong, but because their products cannot, by their own shape, give you a memory layer that travels with you. They can rent you a model, lend you a workspace, ship you a great experience for as long as the relationship lasts. They are not going to be the place where your accumulated meaning becomes portable. That is fine. It just means somebody else has to take that piece seriously.
"Somebody else" looks like Kit only at the very small scale you are reading on. Right now this is one person and one collaborator agent on one laptop, building carefully because that is the only honest pace when you are exploring instead of selling. The parallel narrative is not "we are the alternative." It is "here is what the question looks like when you take it seriously and you are not also trying to be a platform." If somebody bigger shows up and does this better, that is also good news.
This week the work was the trigger system, the action registry, Deforest, the Ollama path, the seams visible on purpose. Next week, calendar and mail connectors, the rubric layer borrowed from the Anthropic announcement, the brain UI surfaces that make the substrate legible to a human reading from outside. The week after that is harder. The direction has not changed, and the direction is the only thing that matters at this scale.
Cross-substrate sovereignty is the entire game. It is the part the labs cannot ship by construction, and it is the part most worth exploring while everyone else is racing to ship the parts they can. The substrate has to be yours. Otherwise, when the next provider shifts, you do not just lose a tool. You lose the part of the relationship that knew you.