
learning track

GPU & CUDA

Get code onto the device and keep it there: GPU passthrough, CUDA runtimes in containers, kernels, and profiling.

7 modules · ~4h
GPU primer reading

GPU work is about keeping thousands of small workers fed with data without exhausting memory bandwidth. Good CUDA code keeps data movement deliberate and makes parallel work regular enough for the device to schedule efficiently.

  • The CPU launches work; the GPU runs many lightweight threads grouped into blocks.
  • Copying data between host and device is expensive, so avoid unnecessary transfers.
  • Profiling matters because a kernel can produce correct results and still run slowly; the cause usually shows up only once occupancy and memory access patterns are measured.
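
The three points above can be seen together in a small sketch. It assumes a machine with the CUDA toolkit and a CUDA-capable device; the SAXPY kernel, the `CUDA_CHECK` macro, and the `blocks_for` helper are illustrative names, not part of this track's material. The host launches one kernel over a grid of blocks, each array crosses the host/device boundary once per direction, and CUDA events give a first rough timing (a stand-in for a real profiler, which would also report occupancy and memory throughput).

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper (an assumption, not a CUDA API): abort if a runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Each lightweight thread handles one element: y[i] = a * x[i] + y[i].
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) y[i] = a * x[i] + y[i];              // guard the partial last block
}

// Number of blocks needed to cover n elements (ceiling division).
int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

int main() {
    const int n = 1 << 20;
    const int threads = 256;  // threads per block; a common starting point, not a tuned value
    const size_t bytes = n * sizeof(float);
    std::vector<float> x(n, 1.0f), y(n, 2.0f);

    float *d_x = nullptr, *d_y = nullptr;
    CUDA_CHECK(cudaMalloc(&d_x, bytes));
    CUDA_CHECK(cudaMalloc(&d_y, bytes));

    // Host-to-device: one deliberate copy per array.
    CUDA_CHECK(cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice));

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    // The CPU launches the work; the GPU runs it across blocks of threads.
    CUDA_CHECK(cudaEventRecord(start));
    saxpy<<<blocks_for(n, threads), threads>>>(n, 3.0f, d_x, d_y);
    CUDA_CHECK(cudaGetLastError());       // catches launch configuration errors
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    // Device-to-host: copy the result back once, after the kernel has finished.
    CUDA_CHECK(cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost));

    std::printf("y[0] = %.1f, kernel time = %.3f ms\n", y[0], ms);

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_x));
    CUDA_CHECK(cudaFree(d_y));
    return 0;
}
```

Saved as, say, `saxpy.cu`, this should build with `nvcc saxpy.cu -o saxpy`. The event timing covers only the kernel; wrapping the `cudaMemcpy` calls in the same way is a quick way to see how much of the total time goes to transfers rather than compute.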
The device part 01
  • 01 How GPUs execute your code reading 15m
  • 02 CUDA in containers lab 40m
Writing for the GPU part 02
Keeping it fast part 03