PyTorch Conference 2025

The schedule is up for the 2025 edition of the PyTorch Conference, which is now at Moscone West in San Francisco.

There are a lot of great sessions, but I’ll highlight some I personally find particularly interesting:

Post-Training: Clearly a big theme this year, with some interesting talks from multiple groups:

PyTorch Conference 2025: Verl: A Flexible and Efficient RL Framew… – The ByteDance Seed team is doing some great work, and Verl seems like a strong post-training option
PyTorch Conference 2025: Post-training at Scale in Native PyTorch… – Evan introduces the new torch package to replace Torchtune, designed for (scaled) RL post-training
PyTorch Conference 2025: Maximizing Luck in Reinforcement Learnin… – Daniel from Unsloth always delivers a great talk
PyTorch Conference 2025: An Open Source Post-Training Stack: Kube… – Anyscale’s ideas on a post-training stack
PyTorch Conference 2025: Lifecycle of a Parameter – Philip Bontra… – Philip works on Torchtune and the new solution mentioned above. This is a “life of a parameter” lightning talk that follows a parameter through its various transformations.

General Training

PyTorch Conference 2025: Monarch: A Distributed Execution Engine… – I’m pretty convinced single controller is the direction we all will go in, and Monarch is a great place to start.
PyTorch Conference 2025: Efficient MoE Pre-training at Scale on A… – A walkthrough of training on AMD at a decent level of scale
PyTorch Conference 2025: MX Training To Inference on B200 Using T… – MX (microscaling) number formats share a scale factor across a block of values, so each parameter can be stored in just a few bits. Blackwell has hardware support, so it will be interesting to see how well it performs.
PyTorch Conference 2025: PyTorch APIs for High Performance MoE Tr… – Some of the best names in PyTorch performance tackling MoE training.
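
To make the MX idea above a bit more concrete, here is a minimal sketch of block scaling: a block of values shares one power-of-two scale, and each element is stored as a small integer. The block size of 32, the 8-bit integer elements, the 8-bit scale, and all function names are illustrative assumptions on my part, not the exact encoding that Blackwell or any particular library uses.

```python
import math

BLOCK_SIZE = 32  # values sharing one scale (assumed for illustration)
ELEM_BITS = 8    # signed integer width per element (assumed for illustration)


def quantize_block(values):
    """Return (shared_exponent, ints) for one block of floats."""
    qmax = 2 ** (ELEM_BITS - 1) - 1  # 127 for 8-bit elements
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0] * len(values)
    # Smallest power-of-two scale such that amax / scale fits within qmax.
    exponent = math.ceil(math.log2(amax / qmax))
    scale = 2.0 ** exponent
    # Round to the nearest integer and clamp as a safety net.
    ints = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return exponent, ints


def dequantize_block(exponent, ints):
    """Rebuild approximate floats from the shared exponent and the integers."""
    scale = 2.0 ** exponent
    return [i * scale for i in ints]


def bits_per_value(block_size=BLOCK_SIZE, elem_bits=ELEM_BITS, scale_bits=8):
    """Storage cost per value once the shared scale is amortized over the block."""
    return elem_bits + scale_bits / block_size
```

With these assumed sizes, the shared scale adds only a quarter of a bit per value, which is why shrinking the element width (say, to 4 bits) translates almost directly into memory savings. The rounding error of each element is bounded by half of the shared scale, so a block with one large outlier loses precision on its small values.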

Kernel development

PyTorch Conference 2025: The Future Is Tiled: Using CuTile and Ti… – CUDA is thread-based, but a lot of the successful kernel development languages (like Triton!) are block- or tile-based. TileIR is a low-level representation of that kind of tile-based programming, which NVIDIA teased back in March.
PyTorch Conference 2025: PyTorch Symmetric Memory: A New Programm… – Symmetric memory allows kicking off comms without going back to the host, so you can fuse more stuff!
PyTorch Conference 2025: Mojo + PyTorch: A Simpler, Faster Path T… – Mojo kind of fell off my radar, but using it as a kernel language is a super interesting idea!
PyTorch Conference 2025: Helion: A High-level DSL for Kernel Auth… – My general take on kernel development is that we are expanding the stack both up and down from Triton to try to give the best range of development vs. performance tradeoffs. Helion is a super interesting approach to going up: bridging the gap between PyTorch and Triton.

Compilers

PyTorch Conference 2025: Lightning Talk: Dynamic Shape Recompilat… – Bob Ren fills in for Ed Yang’s classics with a talk about recompilations and dynamic shapes.
PyTorch Conference 2025: Thunder: Distribute and Optimize Your Py… – A talk on Lightning AI’s PyTorch compiler

Inference

PyTorch Conference 2025: vLLM: Easy, Fast, and Cheap LLM Serving… – vLLM is the most important project in inference right now, so this should be interesting!
PyTorch Conference 2025: Everything Everywhere All at Once: Hardw… – Shockingly the Google talk includes TPUs, but I think this trend (heterogeneous serving) is pretty broadly applicable as we try to optimize our wattage!
PyTorch Conference 2025: ExecuTorch 1.0: General Availability Sta… – Edge is inference too, and I’m glad to see ExecuTorch make it to GA

I’m looking forward to October!
