Programming The Loop

When you have enough AI, what do programmers… do? When it was smart autocomplete (e.g. Copilot), the answer was clear: everything! The AI just handles some typing. When it was interactive IDEs (e.g. Cursor), it was still a lot: pair programming, designing, writing the hardest parts. Now that it’s an independent agent (e.g. Claude Code), it’s guiding, reviewing code, setting guardrails.

But, you know, we want to move faster than that! That means either we have the agent running in a loop without needing us, or we have lots of agents doing things at the same time1. Or both.

Throwing agents at a problem doesn’t automatically solve it2. Which leads back to the question: “what do we do?” The answer seems to be not so much being in the loop but designing the loop itself.

The most viral agent loop right now is Karpathy’s Autoresearch, which finds verifiable training improvements to his nanochat project. Running Autoresearch is straightforward: a human writes program.md with workflow guidelines, and the agent runs in a loop trying ideas. Karpathy’s workflow allocates a fixed compute budget and constrains edits to a single training file to ensure the experiments are valid3. The agent generates ideas, verifies them, then refines: keeping a successful idea as the new baseline and discarding failed ones.
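The control flow is simple enough to sketch. Here’s a toy version in Python; the real loop edits nanochat’s training file and measures model loss, whereas this stand-in hill-climbs a number toward a target, so only the generate-verify-refine shape carries over (all names here are invented, not Autoresearch’s actual code):

```python
import random

def generate(baseline, rng):
    """Agent step: propose a tweak to the current baseline."""
    return baseline + rng.uniform(-1.0, 1.0)

def verify(candidate):
    """Run the 'experiment' and return a metric (higher is better)."""
    return -abs(candidate - 3.0)  # toy objective: get close to 3.0

def autoresearch_loop(baseline=0.0, budget=200, seed=0):
    rng = random.Random(seed)
    best = verify(baseline)
    for _ in range(budget):            # fixed compute budget
        candidate = generate(baseline, rng)
        score = verify(candidate)
        if score > best:               # refine: keep the winner as baseline...
            baseline, best = candidate, score
        # ...and silently discard failed ideas
    return baseline
```

The key property is that every accepted change has been verified against the current baseline, so the loop can only ratchet forward.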

While Karpathy’s agent-in-a-loop is responsible for both generating and implementing ideas, PyTorch’s KernelAgent4 goes multi-agent, giving each agent a specialized role and toolset for improving GPU kernel performance. A profiling worker identifies opportunities, an analyzer agent suggests potential fixes, and so on. The actual execution is best-of-N sampling as an agent loop: it spawns N workers, lets them race, then plans a strategy for the next round.

“Optimization agents reflect on what succeeded and failed in each round, summarizing insights into a shared memory that guides subsequent iterations and prevents repeated dead ends.”

KernelAgent – PyTorch Blog
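That round structure is easy to caricature in code. A minimal sketch, assuming nothing about KernelAgent’s actual API: N workers race in a thread pool, the best result of each round is summarized into a shared memory list, and the next round’s workers receive it (here the workers just draw random scores, since the point is the loop shape, not the kernel work):

```python
import concurrent.futures
import random

def worker(task, insights, seed):
    """One optimization attempt. Real workers profile and rewrite a GPU
    kernel; this toy one draws a score and writes a short note."""
    rng = random.Random(seed)
    score = rng.random()  # stand-in for a measured speedup
    return score, f"attempt {seed} on '{task}': score {score:.2f}"

def best_of_n(task, n=4, rounds=3):
    memory = []                        # shared insight log across rounds
    best = float("-inf")
    for r in range(rounds):
        seeds = range(r * n, (r + 1) * n)
        with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
            results = list(pool.map(lambda s: worker(task, memory, s), seeds))
        score, note = max(results)     # best-of-N: let the workers race
        best = max(best, score)
        memory.append(note)            # summarized insight guides next round
    return best, memory
```

Keeping the memory central (one list, appended between rounds) rather than letting workers talk to each other is what keeps the coordination overhead near zero.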

The pattern that seems to work is to set up agents in a generate-verify-refine loop, following a pre-defined work approach, with guardrails. If you need more parallelism, add multiple agents, but keep state central to avoid communication overhead.

An example of the latter is OpenAI’s Symphony. This moves state into a task tracker, then spawns5 individual codex agents with a fixed budget of iterations. Individual agents write back to the tracker to save state. This type of agent usage is also known as a “Ralph loop”: agents that start fresh for each iteration of the loop, with necessary context injected each time rather than accumulating organically in the context window.
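In sketch form, a Ralph loop looks like this. This is a toy in Python, not Symphony’s actual code: the tracker is a dict, and the “agent” is a stub that completes the first open task it sees in its freshly built prompt:

```python
def fresh_agent(prompt: str) -> str:
    """Stateless agent call: it sees only what this prompt contains.
    Toy behaviour: 'complete' the first open task named in the prompt."""
    for line in prompt.splitlines():
        if line.startswith("TODO: "):
            return line.removeprefix("TODO: ")
    return ""

def ralph_loop(tasks, budget=10):
    tracker = {t: "open" for t in tasks}   # central state, outside any agent
    for _ in range(budget):                # fixed iteration budget
        todo = [t for t, s in tracker.items() if s == "open"]
        if not todo:
            break
        # Re-inject context fresh each iteration, rather than letting it
        # accumulate in a long-lived context window.
        prompt = "\n".join(f"TODO: {t}" for t in todo)
        done = fresh_agent(prompt)
        if done in tracker:
            tracker[done] = "done"         # agent writes back to the tracker
    return tracker

result = ralph_loop(["add tests", "fix lint", "update docs"])
```

Because each iteration starts from a clean slate, the tracker is the only place drift can hide, which makes the loop easy to inspect and restart.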

Much like with Karpathy’s program.md you “program” the WORKFLOW.md with how you want the loop to run, then it executes autonomously.

Designing the workflow feels like a genuinely different skill. It’s not writing the code; the agent does that. It’s not specifying the solution either; in many cases the agent does that too! It’s about designing an approach: how can the agent make progress with each turn of the crank? How can the environment give the agent a clean validation signal about its approach? Not easy, and not quite what we used to do either.

  1. AKA agent teams, swarms, or whichever Mad Max movie Yegge is on today. Google’s new “Towards a Science of Scaling Agent Systems” paper is not keen on multi-agent systems though: “on tasks requiring strict sequential reasoning […] every multi-agent variant we tested degraded performance by 39-70%”. ↩︎
  2. Condolences if your executives are currently pushing that as company strategy A. ↩︎
  3. Mostly: it did engage in a bit of seed hacking, so has achieved postgrad status successfully. ↩︎
  4. Disclaimer: folks in my team worked on this! ↩︎
  5. It also uses Elixir for coordination in the sample code, which brings warmth to my heart. Hello Joe. ↩︎
