On The Factory Floor

March 20, 2023

“On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models” dives into practical lessons Google learned from building large-scale models that predict click-through rates on ads. This incredibly dense paper is packed with knowledge gained through real-world experience, and I highly recommend it to anyone working on large-scale ranking and recommendation systems.

One of the most interesting ideas in the paper is to judge every improvement on two axes: the accuracy it gains against the model complexity it adds, and the complexity it could remove while holding accuracy constant. Any improvement in accuracy can instead be reframed as a reduction in model size at the same accuracy, which helps keep training time from ballooning. Looked at this way, some advances are bigger wins in efficiency than they are in accuracy.

Bottlenecking is another concept discussed in the paper. Making layers in the deep network wider is a consistently effective way of improving model accuracy, but comes with a significant cost in training time. By adding narrow bottleneck layers, the costs associated with making layers wider can be mitigated while only slightly impacting accuracy. The paper also delves into automating the addition of these layers using reinforcement learning, which can lead to significant savings in training time without sacrificing accuracy.
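To get a feel for why a narrow layer helps, here is a minimal sketch that counts the parameters in a stack of fully connected layers with and without a bottleneck. The layer widths are made up for illustration and are not taken from the paper; parameter count is used only as a rough proxy for training cost.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def stack_params(widths: list[int]) -> int:
    """Total parameters of a stack of dense layers with the given widths."""
    return sum(dense_params(a, b) for a, b in zip(widths, widths[1:]))


# Hypothetical widths, chosen only for illustration.
wide = [1024, 4096, 4096, 1]
# Same wide layers, with a narrow 256-unit layer inserted between them.
bottlenecked = [1024, 4096, 256, 4096, 1]

wide_count = stack_params(wide)
bottleneck_count = stack_params(bottlenecked)

print(f"wide stack:         {wide_count:,} parameters")
print(f"bottlenecked stack: {bottleneck_count:,} parameters")
print(f"ratio:              {bottleneck_count / wide_count:.2f}")
```

The large 4096 x 4096 weight matrix is replaced by two much smaller 4096 x 256 matrices, which is where the savings come from. How much accuracy that costs is an empirical question this sketch does not answer.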

Data sampling is critical when dealing with massive amounts of data. The system trains over a time window, favoring more recent examples. Clicked examples are much rarer than unclicked ones, so the authors sample clicked examples at a higher rate and correct the resulting bias later. They also up-sample ads that are not often seen or clicked.
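A common way to implement this kind of class rebalancing is to keep every click, keep only a fraction of the non-clicks, and give the surviving non-clicks a larger weight so that weighted statistics still match the full log. The sketch below assumes that scheme; the keep rate, the synthetic 2% click rate, and the function name are all hypothetical, and it ignores the time-window and rare-ad sampling mentioned above.

```python
import random


def downsample_negatives(examples, keep_rate, rng):
    """Keep every click; keep each non-click with probability keep_rate.

    Kept non-clicks get weight 1 / keep_rate, so weighted counts remain
    unbiased estimates of the counts in the full log.
    """
    sampled = []
    for features, clicked in examples:
        if clicked:
            sampled.append((features, clicked, 1.0))
        elif rng.random() < keep_rate:
            sampled.append((features, clicked, 1.0 / keep_rate))
    return sampled


rng = random.Random(0)
# Synthetic impression log with a made-up click rate of about 2%.
log = [({"ad_id": i}, rng.random() < 0.02) for i in range(100_000)]
sampled = downsample_negatives(log, keep_rate=0.1, rng=rng)

true_ctr = sum(clicked for _, clicked in log) / len(log)
weighted_ctr = (
    sum(w for _, clicked, w in sampled if clicked)
    / sum(w for _, _, w in sampled)
)

print(f"kept {len(sampled):,} of {len(log):,} examples")
print(f"true CTR {true_ctr:.4f}, weighted CTR after sampling {weighted_ctr:.4f}")
```

The training set shrinks to a fraction of its original size, while the weights carry the bias correction into the loss.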

Loss optimization is another practical aspect of the system. The model is trained on multiple objectives: a ranking loss (how correctly it orders ads) alongside the regular click-through rate loss. The paper also mentions distillation, which adds a loss against the predictions of a teacher model. However, training with all losses at once can lead to instability, so they start with log loss alone and gradually introduce the other losses.
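One way to picture "start with log loss and gradually introduce the rest" is a weight that ramps from 0 to 1 on the auxiliary losses. The sketch below is an assumption-heavy illustration, not the paper's recipe: the pairwise logistic ranking loss, the linear ramp, the step counts, and the 0.1 loss weights are all choices made here for the example.

```python
import math


def log_loss(p, y):
    """Binary cross-entropy; y may be a hard label or a soft teacher probability."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def pairwise_rank_loss(scores, labels):
    """Logistic loss over (clicked, unclicked) pairs; large when pairs are misordered."""
    pairs = [
        (s_pos, s_neg)
        for s_pos, y_pos in zip(scores, labels) if y_pos == 1
        for s_neg, y_neg in zip(scores, labels) if y_neg == 0
    ]
    if not pairs:
        return 0.0
    return sum(math.log1p(math.exp(-(sp - sn))) for sp, sn in pairs) / len(pairs)


def ramp(step, start, length):
    """0 before `start`, then rising linearly to 1 over `length` steps."""
    return min(1.0, max(0.0, (step - start) / length))


def total_loss(logits, labels, teacher_probs, step):
    probs = [1 / (1 + math.exp(-z)) for z in logits]
    click = sum(log_loss(p, y) for p, y in zip(probs, labels)) / len(labels)
    rank = pairwise_rank_loss(logits, labels)
    distill = sum(log_loss(p, t) for p, t in zip(probs, teacher_probs)) / len(labels)
    # Hypothetical schedule: auxiliary losses switch on between steps 1000 and 5000.
    w = ramp(step, start=1000, length=4000)
    return click + w * (0.1 * rank + 0.1 * distill)


logits = [2.0, -1.0, 0.5, -2.0]
labels = [1, 0, 1, 0]
teacher = [0.9, 0.2, 0.7, 0.1]
for step in (0, 3000, 5000):
    print(step, round(total_loss(logits, labels, teacher, step), 4))
```

Early in training only the click log loss contributes, so the model can settle before the ranking and distillation terms start pulling on it.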
