Improving Recommendation Systems with LLMs

https://eugeneyan.com/writing/recsys-llm/

Eugene Yan has put together a really extensive survey of recent research exploring the use of LLMs in recommendation systems.

Although early research in 2023—that applied LLMs to recommendations and search—often fell short, these recent efforts show more promise, especially since they’re backed by industry results. It suggests that there are tangible benefits from exploring the augmentation of recsys and search systems with LLMs, increasing performance while reducing cost and effort.

Recommendation systems are enormously important to large swathes of the tech business, primarily for e-commerce, content, and advertisement targeting. Traditional deep recommenders typically use a two-tower architecture: one tower for users and another for items, independently encoding features into embeddings that can be scored together to retrieve and rank items. Features in each tower include both sparse ones (usually categorical, e.g., item categories, user histories) and dense ones (often continuous, e.g., age, price).

This design is popular because it's effective and scalable: you can cache each tower’s embedding vectors and only pull in the ones you need for a given query (e.g., the batch of users you are getting recommendations for right now).
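As a rough illustration of the caching idea, here is a minimal sketch of a two-tower retrieval step. Everything in it is an assumption made for illustration: the `embed` function stands in for a trained tower, and the weights, feature vectors, and item names are made up. Item embeddings are computed once and cached; only the user embedding is computed at request time.

```python
import math

def embed(features, weights):
    """Toy 'tower': one linear layer followed by L2 normalization."""
    raw = [sum(w * x for w, x in zip(row, features)) for row in weights]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0
    return [v / norm for v in raw]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical weights; in a real system each tower is a trained network.
USER_W = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
ITEM_W = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical item features. The item tower runs once, offline,
# and its outputs are cached for reuse across requests.
ITEM_FEATURES = {
    "running_shoes": [1.0, 0.1],
    "cookbook": [0.1, 1.0],
    "yoga_mat": [0.8, 0.3],
}
ITEM_CACHE = {item_id: embed(f, ITEM_W) for item_id, f in ITEM_FEATURES.items()}

def recommend(user_features, k=2):
    """Embed the user at request time and score against cached item vectors."""
    user_vec = embed(user_features, USER_W)
    ranked = sorted(ITEM_CACHE.items(),
                    key=lambda kv: dot(user_vec, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in ranked[:k]]
```

Because the two towers never see each other's inputs, the item side can be precomputed and indexed, which is where the scalability comes from. A production system would replace the sorted scan with an approximate nearest-neighbor index.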

Despite the effectiveness and scalability of this approach, traditional systems often struggle with a set of known issues, such as cold-start problems—predicting relevant content for new items or users—and typically don’t consider interaction recency without additional engineering.

Yan categorizes recent research into four areas:

LLM/Multimodal Architectures:
- Directly embedding content understanding within the models. Separate content-understanding models have long been used to generate additional metadata for content items, which helps with both cold start and accuracy.
- Generative approaches, which reframe recommendation as predicting future user actions based on interaction sequences.
LLM-Assisted Data Generation and Analysis:
- Improving content understanding and generating richer metadata for items.
Scaling Laws, Transfer, and Distillation:
- Adapting LLMs to meet the latency requirements of recommendations, through smaller models and efficient inference techniques. RecSys models, particularly those for advertising, tend to have very tight latency budgets.
Unified Architectures for Search and Recommendations:
- Consolidating search and recommendation tasks into unified models that enable returning items based on interaction histories and/or user queries simultaneously.
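To make the distillation idea in the third area a little more concrete, here is a minimal sketch of a generic soft-label distillation loss: the small student model is trained to match the large teacher's temperature-softened output distribution. This is the textbook formulation, not necessarily the objective used in any paper from the survey; the function names and the default temperature are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened distribution.

    A higher temperature flattens both distributions, so the student also
    learns from the teacher's relative scores for lower-ranked items.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
```

The loss is smallest when the student's distribution matches the teacher's, so minimizing it pushes a cheaper model toward the ranking behavior of the expensive one, which is what makes it useful under a tight serving budget.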

A couple of common themes emerge from reading the summaries:

Semantic Content Integration & Joint Tasks: Techniques like YouTube’s Semantic IDs and Kuaishou’s M3CSR generate content-based identifiers replacing traditional hashed IDs. The idea is to have inputs to the models represent the content in a way that carries meaning, rather than represent an identifier for the content.
Efficiency in Inference: Teacher-student distillation and efficient fine-tuning allow generating smaller, performant models for specific needs. For instance, Alibaba’s MLoRA trains a base model then LoRA fine-tunes for specific types of content, replacing a number of independently trained models.
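The base-plus-adapters pattern can be sketched in a few lines. This is a generic LoRA-style illustration, not Alibaba's actual MLoRA implementation: a frozen shared weight matrix `W` is combined with a small per-domain low-rank update `B (A x)`. The matrices, domain names, and the rank of 1 are all made up for the example.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

# Shared base weight (2 outputs x 3 inputs): trained once, then frozen.
BASE_W = [[0.5, 0.0, 0.1],
          [0.0, 0.5, 0.1]]

# Hypothetical per-domain adapters with rank 1: A is 1x3, B is 2x1.
# Only these small matrices are trained for each content type.
ADAPTERS = {
    "video": {"A": [[1.0, 0.0, 0.0]], "B": [[0.2], [0.0]]},
    "books": {"A": [[0.0, 1.0, 0.0]], "B": [[0.0], [0.3]]},
}

def forward(x, domain=None):
    """Compute W x, plus the domain's low-rank update B (A x) if one exists."""
    out = matvec(BASE_W, x)
    adapter = ADAPTERS.get(domain)
    if adapter is not None:
        delta = matvec(adapter["B"], matvec(adapter["A"], x))
        out = [o + d for o, d in zip(out, delta)]
    return out
```

In this toy the adapters are nearly as large as the base matrix, but with realistic layer sizes the rank is far smaller than the layer dimensions, so each extra domain costs a small fraction of a separately trained model.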

These two combine somewhat to enable a trend towards more foundation-model-like training in RecSys that tackles a variety of user personalization tasks with a unified view of users, content, and user/content interactions.
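For the first theme, the difference between a hashed ID and a content-based one can be shown with a small residual-quantization sketch: a content embedding is mapped to a short tuple of codes, coarse first, so similar items share a prefix. This is a simplification. The codebooks below are hand-written and hypothetical, whereas real systems learn them from content embeddings, and it should not be read as the method of either system named above.

```python
def nearest(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])),
    )

# Hypothetical two-level codebooks over 2-d content embeddings.
CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0]],                              # level 1: coarse
    [[0.2, 0.0], [-0.2, 0.0], [0.0, 0.2], [0.0, -0.2]],    # level 2: refines the residual
]

def semantic_id(content_embedding):
    """Quantize an embedding level by level, passing the residual down each time."""
    residual = list(content_embedding)
    codes = []
    for codebook in CODEBOOKS:
        idx = nearest(residual, codebook)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)
```

Two items with similar content, say embeddings `[1.1, 0.05]` and `[0.9, 0.02]`, get IDs that agree on the first code and differ only in the second, while an unrelated item lands under a different first code. A hashed ID gives no such structure, so a brand-new item starts with nothing the model can generalize from.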
