https://open.substack.com/pub/gonzoml/p/darwin-godel-machine
Interesting paper breakdown on Gonzo ML of another evolutionary agent approach from the extended Sakanaverse.
The Darwin Gödel Machine (DGM) starts with an initial coding agent, built on a frozen foundation model (FM) equipped with tool-use capabilities (e.g., running bash commands, editing files). In each cycle, "parent" agents are selected from the expanding archive. Selection prioritizes agents by combining their performance (higher scores weighted more heavily, scaled by a sigmoid) with a novelty bonus (inversely related to the number of offspring they have already produced, thereby encouraging exploration of less-frequented paths).
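The selection step described above can be sketched roughly as follows. This is an illustrative reimplementation, not the paper's exact formulation: the sigmoid scale and midpoint constants, and the multiplicative combination of the two terms, are my assumptions.

```python
import math
import random

def selection_weights(agents, sigmoid_scale=10.0, sigmoid_midpoint=0.5):
    """Each agent is a (benchmark_score, num_children) pair.

    Returns normalized sampling weights combining performance and novelty.
    """
    weights = []
    for score, num_children in agents:
        # Performance term: higher scores weighted more, passed through a sigmoid.
        perf = 1.0 / (1.0 + math.exp(-sigmoid_scale * (score - sigmoid_midpoint)))
        # Novelty bonus: fewer offspring -> larger weight, favoring unexplored paths.
        novelty = 1.0 / (1.0 + num_children)
        weights.append(perf * novelty)
    total = sum(weights)
    return [w / total for w in weights]

def sample_parent(agents, rng=random):
    """Sample one parent index from the archive, weighted as above."""
    probs = selection_weights(agents)
    return rng.choices(range(len(agents)), weights=probs, k=1)[0]
```

Note how this balances exploitation and exploration: a strong agent that has already spawned many children gradually loses weight to untried, lower-scoring ones.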
The foundation model itself is a frozen component, so, much like AlphaEvolve, this is a search procedure layered on top of the model's intelligence. The search evolves the agent's own code to try to do better on coding benchmarks.
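A skeleton of that outer search loop might look like this. All function names here are placeholders I've invented for illustration; the real DGM's archive handling and evaluation are more involved.

```python
import random

def sample_parent_index(archive, rng=random):
    # Simplified stand-in for the sigmoid-scaled score + novelty weighting.
    weights = [score / (1.0 + kids) for _, score, kids in archive]
    return rng.choices(range(len(archive)), weights=weights, k=1)[0]

def evolve(initial_agent_code, evaluate, self_modify, num_generations):
    # Archive entries: (agent_code, benchmark_score, num_children).
    # Note the FM is never trained; only the agent code changes.
    archive = [(initial_agent_code, evaluate(initial_agent_code), 0)]
    for _ in range(num_generations):
        idx = sample_parent_index(archive)
        parent_code, parent_score, children = archive[idx]
        archive[idx] = (parent_code, parent_score, children + 1)
        # The frozen FM, running as the parent agent, rewrites its own code.
        child_code = self_modify(parent_code)
        archive.append((child_code, evaluate(child_code), 0))
    # Return the best-scoring agent discovered.
    return max(archive, key=lambda entry: entry[1])
```

The key design choice is keeping every agent in the archive rather than only the current best, so the search can later branch from a lineage that initially scored poorly.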
Qualitatively, the DGM learned to enhance its own tools and workflows. For instance, it developed more granular file editing capabilities (e.g., string replacement), improved long-context window management (e.g., auto-summarizing prior interactions), and refined its problem-solving strategies (e.g., making multiple attempts at a solution and using another FM to evaluate patches). These discovered improvements also demonstrated generalizability, transferring benefits across different underlying FMs and programming languages.
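For a sense of what "more granular file editing" means in practice, here is a minimal sketch of a string-replacement edit tool of the kind described, as opposed to rewriting a whole file per edit. This is my own illustrative version, not the DGM's actual discovered tool; the uniqueness check is an assumption.

```python
from pathlib import Path

def str_replace_edit(path, old, new):
    """Replace one exact occurrence of `old` with `new` in the file at `path`.

    Requiring the target to be unique keeps the edit unambiguous for the model.
    """
    text = Path(path).read_text()
    count = text.count(old)
    if count == 0:
        return f"error: target string not found in {path}"
    if count > 1:
        return f"error: target string appears {count} times; provide more context"
    Path(path).write_text(text.replace(old, new, 1))
    return "ok"
```

The error strings matter as much as the happy path: they are fed back into the context window, so the agent can recover by supplying a longer, unique snippet.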
When it comes to coding agents, I had been thinking there were three axes of performance that gate overall effectiveness, but the paper makes it clear there are at least four:
- The foundation model itself, with its base coding, tool use, reasoning abilities and context window size
- The tools it has available – the more a tool exposes its underlying semantics, the more efficiently the model can use it.
- The UI, how the user interacts with the agent to direct it, provide clarification, and review its work.
- The prompt, strategies for problem solving, and how the context window is managed (e.g., when to summarize)
In this case the UI is held fixed (an outer eval loop), the model is fixed, and the search explores tools and strategies. At the very least, letting the search also choose among multiple different models seems like it would work well!