learning track

GPU & CUDA

Get code onto the device and keep it there: GPU passthrough, CUDA runtimes in containers, kernels, and profiling.

7 modules · ~4h

GPU primer reading

GPU work is about keeping thousands of small workers fed without exhausting memory bandwidth. Good CUDA code keeps data movement deliberate and makes parallel work regular enough for the device to schedule efficiently.

▸ The CPU launches work; the GPU runs many lightweight threads grouped into blocks.
▸ Copying data between host and device is expensive, so avoid unnecessary transfers.
▸ Profiling matters because a slow kernel can still produce correct results; the cost only becomes visible once occupancy and memory access patterns are measured.
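
To make the thread-and-block idea from the first point concrete, here is a minimal sketch in plain Python. It runs entirely on the CPU and uses no CUDA library; it only models how a launch maps (block, thread) pairs to array indices. The names `blocks_needed`, `add_kernel`, and `launch` are hypothetical and chosen for illustration, and the sequential loop stands in for work that a real device would run in parallel.

```python
# CPU-only model of a CUDA-style launch. No GPU or CUDA library is involved;
# all function names here are hypothetical and exist only for illustration.

def blocks_needed(n, threads_per_block):
    """Ceiling division: the number of blocks required to cover n elements."""
    return (n + threads_per_block - 1) // threads_per_block


def add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Work done by one simulated thread: add a single pair of elements."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):  # guard: the last block may contain spare threads
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Run the kernel once per (block, thread) pair.

    A real device schedules these threads in parallel; this model simply
    loops over them in order.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


n = 10
threads_per_block = 4
a = list(range(n))
b = [10 * x for x in a]
out = [0] * n

grid_dim = blocks_needed(n, threads_per_block)
launch(add_kernel, grid_dim, threads_per_block, a, b, out)
print(grid_dim, out)
```

With 10 elements and 4 threads per block, the launch needs 3 blocks, so 12 threads are created and the bounds check keeps the last 2 from writing past the end of the array. The same index arithmetic and guard appear in most one-dimensional CUDA kernels.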

The device part 01

01 ▤ How GPUs execute your code reading 15m
02 ❯ CUDA in containers lab 40m

Writing for the GPU part 02

03 ❯ NumPy vectorization lab 25m
04 ❯ CUDA kernels lab 45m
05 ▤ Grids, blocks, and occupancy reading 20m

Keeping it fast part 03

06 ❯ PyTorch playground lab 35m
07 ⚑ Starved GPU challenge 45m