Scaling Workloads Across Multiple GPUs with CUDA C++
(NSWAMGC-OD)
CUDA C++ applications that correctly and efficiently use every GPU on a node can run substantially faster than single-GPU code, and they make the most cost-effective use of multi-GPU compute nodes. In this workshop you will learn to use multiple GPUs on a single node by:
- Launching kernels on multiple GPUs, each working on a subsection of the required work
- Using concurrent CUDA Streams to overlap memory copies with computation on multiple GPUs
Upon completion, you will be able to build robust and efficient CUDA C++ applications that can leverage all available GPUs on a single node.
Prerequisites
- Professional experience programming CUDA C/C++ applications, including the use of the nvcc compiler, kernel launches, grid-stride loops, host-to-device and device-to-host memory transfers, CUDA Streams, copy/compute overlap, and CUDA error handling.
- Familiarity with the Linux command line.
- Experience using Makefiles to compile C/C++ code.
Suggested Resources to Satisfy Prerequisites
- Fundamentals of Accelerated Computing with CUDA C/C++.
- Accelerating CUDA C++ Applications with Concurrent Streams.
- Ubuntu Command Line for Beginners (sections 1 through 5).
- Makefile Tutorial (through Simple Examples).
Tools, Libraries, and Frameworks Used
- CUDA C++
- nvcc
- Nsight Systems