A Primer on Post-Training


A Primer on LLM Post-Training – PyTorch

Very excited to see this publicly available. David, who previously worked on Llama, moved to the PyTorch team at the start of the year and wrote this excellent guide to post-training for internal use. This is a cleaned-up version of the same doc, and it provides a fantastic introduction to the world of post-training for modern LLMs.

It also includes one of my favorite perverse incentive examples:

Note: this happens with humans too! We just call these Perverse Incentives, but they are literally the same thing. The British government, concerned about the number of venomous cobras in Delhi, offered a bounty for every dead cobra. Initially, this was a successful strategy; large numbers of snakes were killed for the reward. Eventually, however, people began to breed cobras for income.

The real kicker in that one came when the government realized what was happening and canceled the bounty. The folks who had been breeding cobras no longer wanted to look after them, so they just released them, leaving a lot more cobras than there had been before!
