Tag: agentic

  • Darwin Gödel Machines

    https://open.substack.com/pub/gonzoml/p/darwin-godel-machine

    Interesting paper breakdown on Gonzo ML of another evolutionary agent approach from the extended Sakanaverse.

    It commences with an initial coding agent, constructed upon a frozen foundation model (FM) equipped with tool-use capabilities (e.g. running bash commands, editing files). In each cycle, “parent” agents are selected from the expanding archive. This selection process prioritizes agents based on a combination of their performance (assigning greater weight to higher scores, scaled by sigmoid) and a novelty bonus (inversely correlated with the number of offspring they have already produced, thereby encouraging exploration of less-frequented paths).

    The actual foundation model is a frozen component, so much like alphaevolve this is a search set up on top of the model intelligence. The search is evolving the agent code itself to try and do better on benchmarks.

    Qualitatively, the DGM learned to enhance its own tools and workflows. For instance, it developed more granular file editing capabilities (e.g., string replacement), improved long-context window management (e.g., auto-summarizing prior interactions), and refined its problem-solving strategies (e.g., making multiple attempts at a solution and using another FM to evaluate patches). These discovered improvements also demonstrated generalizability, transferring benefits across different underlying FMs and programming languages.

    When it comes to coding agents I had been thinking there were three axes of performance, which gate the overall effectiveness, but the paper makes it clear there are at least 4:

    1. The foundation model itself, with its base coding, tool use, reasoning abilities and context window size
    2. The tools it has available – the more the tool is exposes underlying semantics the more the model can efficiently use it.
    3. The UI, how the user interacts with the agent to direct it, provide clarity and review work.
    4. The prompt, strategies for problem solving and how the context window is managed (eg when to summarize)

    In this case the UI is held fixed (an outer eval loop), the model is fixed and the search explores tools and strategies. It seems at the very least a search across multiple different models as options might also work well!

  • Libraries not Frameworks & Training vs Agentic loops

    A couple of conversations I was in last week around agentic system design reminding me of Brandon Smith’s excellent write libraries, not frameworks post

    A library is a set of building blocks that may share a common theme or work well together, but are largely independent.

    A framework is a context in which someone writes their own code. This could take the form of inversion-of-control, a domain-specific language, or just a very opinionated and internally-coupled library. 

    […]

    So here’s my point: frameworks aren’t always bad, but they are a much bigger risk – for both the creators and the users – than libraries are. If your framework can be a library without losing much, it probably should be

    I feel like we have seen this in ML around general-purpose training loops. Everyone training a model needs a training loop, and there are a lot of commonalities (dataloading, checkpointing, observability and so on). It’s very tempting to build a general training loop that many different groups can use. Unfortunately, this is almost inevitably a framework, rather than a library, and inherently hard to compose.

    When the needs of the modeler exceed the bounds of the framework they either have to make extensive changes or drop the framework and move to a more bespoke set up. In practice this seems to result in a handful of training frameworks that are somewhat domain specialized: for example, a recsys training framework, an LLM training framework, a multimedia/vision oriented framework and so on.

    My sense is the same pattern will happen with agentic loops. A single “one-size-fits-all” agentic framework can feel too broad or rigid, and many teams will carve out domain-specific variants to get the features they truly need. Ideally, we will identify some truly generic components that can be build out library-style, and composed to the domains that we need.