On what neural net architectures actually make a difference

https://nonint.com/2024/03/03/learned-structures/

As time has passed, I’ve internally converged on the understanding that there are only a few types of architectural tweaks that actually meaningfully impact performance across model scales. These tweaks seem to fall into one of two categories: modifications that improve numerical stability during training, and modifications that enhance the expressiveness of a model in learnable ways.

I think this is a super interesting post, because the mental model of architectures it gives is helpful. The idea that a model learns “in stages” is very intuitive, and it is easy to see in some architectures (like convnets!).
