Fundamentals of Accelerated Computing with CUDA Python (NFACWCP-OD)

This course explores how to use Numba, the just-in-time, type-specializing Python function compiler, to accelerate Python programs by running them on massively parallel NVIDIA GPUs. You’ll learn how to:

  • Use Numba to compile CUDA kernels from NumPy universal functions (ufuncs).
  • Use Numba to create and launch custom CUDA kernels.
  • Apply key GPU memory management techniques.

Upon completion, you’ll be able to use Numba to compile and launch CUDA kernels to accelerate your Python applications on NVIDIA GPUs.

Prerequisites


  • Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations.
  • NumPy competency, including the use of ndarrays and ufuncs.
  • No previous knowledge of CUDA programming is required.

Suggested Resources to Satisfy Prerequisites

Tools, Libraries, and Frameworks Used

  • Numba