CUDA graphs in PyTorch

Kernel launch overhead eats a lot of GPU performance, especially when a workload is made up of many small kernels. CUDA graphs let you capture a sequence of kernels and launch them as a single unit, and they’re now more accessible from PyTorch:

CUDA Graphs, which made its debut in CUDA 10, let a series of CUDA kernels to be defined and encapsulated as a single unit, i.e., a graph of operations, rather than a sequence of individually-launched operations. It provides a mechanism to launch multiple GPU operations through a single CPU operation, and hence reduces the launching overheads.

https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/
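
To make the capture-then-replay idea concrete, here is a minimal sketch using `torch.cuda.CUDAGraph` and the `torch.cuda.graph` context manager. The function names `eager_step` and `run_step` and the tensor size are my own illustrative choices, not something from the linked post, and the CPU fallback is only there so the snippet still runs on a machine without a GPU.

```python
import torch


def eager_step(x):
    # A few small ops; in eager mode each one is a separate kernel launch.
    return torch.relu(x * 2.0 + 1.0).sum()


def run_step(fill_value, n=1024):
    # Hypothetical helper: runs eager_step through a CUDA graph when possible.
    if not torch.cuda.is_available():
        # CUDA graphs need a GPU; fall back to plain eager execution on CPU.
        return eager_step(torch.full((n,), fill_value)).item()

    # Graph replay reuses the same memory, so inputs live in a static tensor.
    static_in = torch.ones(n, device="cuda")

    # Warm up on a side stream before capturing.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):
            eager_step(static_in)
    torch.cuda.current_stream().wait_stream(side)

    # Capture: the kernels are recorded into the graph, not executed.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_out = eager_step(static_in)

    # Replay: copy new data into the static input, then launch the whole
    # recorded sequence with a single CPU-side call.
    static_in.fill_(fill_value)
    graph.replay()
    torch.cuda.synchronize()
    return static_out.item()


result = run_step(2.0)
print(result)
```

The thing to notice is that new inputs are copied into `static_in` rather than passed as fresh tensors: a replay re-runs the recorded kernels against the same memory addresses, which is what lets it skip the per-kernel launch work.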
