On what neural net architectures actually make a difference

https://nonint.com/2024/03/03/learned-structures/

As time has passed, I’ve internally converged on the understanding that there are only a few types of architectural tweaks that actually meaningfully impact performance across model scales. These tweaks seem to fall into one of two categories: modifications that improve numerical stability during training, and modifications that enhance the expressiveness of a model in learnable ways.

I think this is a super interesting post, because the mental model it gives of architectures is helpful. The idea that the model learns “in stages” is very intuitive, and it is easy to see in some architectures (like convnets!).
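To make the first category from the quoted passage (numerical stability) a bit more concrete, here is a toy sketch. It is not taken from the linked post: the choice of RMS normalization as the example tweak, the purely linear stack, and every name and parameter (`run_stack`, `gain`, `width`, `depth`) are assumptions made for illustration. It pushes a random vector through a stack of random dense layers and compares the activation scale at the end with and without a normalization step after each layer.

```python
import math
import random


def rms(x):
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def random_layer(width, gain, rng):
    """Hypothetical dense layer: Gaussian weights with std gain / sqrt(width)."""
    scale = gain / math.sqrt(width)
    return [[rng.gauss(0.0, scale) for _ in range(width)] for _ in range(width)]


def apply_layer(layer, x):
    """Plain matrix-vector product, no bias and no nonlinearity."""
    return [sum(w * v for w, v in zip(row, x)) for row in layer]


def rms_normalize(x, eps=1e-8):
    """Rescale x so that its RMS is (approximately) 1."""
    r = rms(x)
    return [v / (r + eps) for v in x]


def run_stack(depth, width, gain, normalize, seed=0):
    """Return the RMS of the activations after `depth` random layers."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        x = apply_layer(random_layer(width, gain, rng), x)
        if normalize:
            x = rms_normalize(x)
    return rms(x)


if __name__ == "__main__":
    # gain > 1 means each layer tends to enlarge its input.
    plain = run_stack(depth=30, width=32, gain=2.0, normalize=False)
    normed = run_stack(depth=30, width=32, gain=2.0, normalize=True)
    print(f"final RMS without normalization: {plain:.3e}")
    print(f"final RMS with normalization:    {normed:.3e}")
```

With a gain above 1, the unnormalized activations grow roughly geometrically with depth, while the normalized stack stays at unit scale no matter how deep it is. A real network adds nonlinearities, residual paths, and learned parameters, so treat this only as an intuition for why stability-oriented tweaks tend to keep mattering as models scale.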
