Optimizing CUDA Machine Learning Codes With Nsight Profiling Tools (NOCMLCWNPT-OD)

NVIDIA Developer Tools are a collection of applications, spanning desktop and mobile targets, which enable developers to build, debug, profile, and develop class-leading and cutting-edge software utilizing the latest visual computing hardware from NVIDIA. In this course you will learn the effective use of two powerful NVIDIA developer tools: Nsight Systems and Nsight Compute.

Nsight Systems provides developers a system-wide visualization of an application's performance. Developers can optimize bottlenecks to scale efficiently across any number or size of CPUs and GPUs, from large servers to the smallest systems on a chip. Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and a command-line tool.

By the time you complete this course you will be able to use Nsight Systems and Nsight Compute to analyze and optimize CUDA applications. Following best practices, you will begin by using Nsight Systems to analyze overall application structure and explore parallelization opportunities before turning to Nsight Compute to analyze and optimize individual CUDA kernels.
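As a rough illustration of that two-stage order, the sketch below only builds the command lines one might run: first `nsys profile` for a system-wide timeline, then `ncu` for a closer look at a single kernel. The helper name `profiling_commands`, the executable `./train`, the kernel name `matmul_kernel`, and the report names are hypothetical placeholders, not part of the course material. The flags shown are a small, commonly used subset.

```python
import shlex
import shutil


def profiling_commands(app, kernel=None, report="report"):
    """Return (nsys_cmd, ncu_cmd) for a system-first, kernel-second workflow.

    `app` is the application command as a list of strings. All names here
    are illustrative assumptions, not taken from the course.
    """
    # Stage 1: Nsight Systems captures a timeline of the whole application.
    nsys_cmd = ["nsys", "profile", "--stats=true", "-o", f"{report}_nsys", *app]

    # Stage 2: Nsight Compute profiles kernels, optionally limited to one name.
    ncu_cmd = ["ncu", "-o", f"{report}_ncu"]
    if kernel:
        ncu_cmd += ["--kernel-name", kernel]
    ncu_cmd += app
    return nsys_cmd, ncu_cmd


if __name__ == "__main__":
    # "./train" and "matmul_kernel" are hypothetical placeholders.
    cmds = profiling_commands(["./train", "--epochs", "1"], kernel="matmul_kernel")
    for cmd in cmds:
        # Report whether the tool is installed; nothing is executed here.
        status = "found" if shutil.which(cmd[0]) else "not on PATH"
        print(f"[{cmd[0]}: {status}] {shlex.join(cmd)}")
```

The commands are printed rather than executed, so the sketch runs on a machine without the tools installed. In practice the kernel passed to the second stage would be one that the Nsight Systems timeline showed to dominate GPU time.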

Learning Objectives

By participating in this workshop you will be able to:

  • Use Nsight Systems to visualize and analyze a CUDA application's performance, identifying and addressing application bottlenecks
  • Use Nsight Compute to interactively profile and analyze individual CUDA kernels, optimizing them based on your findings
  • Combine the use of Nsight Systems and Nsight Compute into an effective optimization workflow for many GPU-accelerated machine learning applications

Tools, Libraries, and Frameworks Used

  • NVIDIA Nsight Systems
  • NVIDIA Nsight Compute


Prerequisites

  • Familiarity with machine learning applications using CUDA