Deep Learning at Scale with Horovod (NDLSH-OD)

Modern deep learning problems rely on increasingly large datasets and increasingly complex models. As a result, training these models effectively and efficiently requires significant computational power. In this course, you will learn how to scale deep learning training to multiple GPUs with Horovod, the open-source distributed training framework originally built by Uber. Over the course of 2 hours, you'll:

  • Complete a step-by-step refactor of a Fashion-MNIST classification model to use Horovod and run on four NVIDIA V100 GPUs
  • Understand Horovod's MPI roots and develop an intuition for parallel programming motifs like multiple workers, race conditions, and synchronization
  • Apply techniques such as learning rate warmup, which strongly affect the performance of scaled deep learning training

Upon completion, you'll be able to use Horovod to effectively scale deep learning training in new or existing code bases.
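
To give a feel for the learning rate warmup mentioned above, here is a minimal sketch in plain Python. It assumes a linear ramp from the single-GPU learning rate up to that rate multiplied by the number of workers, which is one common approach and not necessarily the exact one taught in the course. The function name and parameters are hypothetical and are not part of Horovod's API.

```python
def warmup_learning_rate(base_lr, num_workers, epoch, warmup_epochs=5):
    """Return the learning rate to use at a given (possibly fractional) epoch.

    Illustrative assumption: the rate ramps linearly from base_lr to
    base_lr * num_workers over the first warmup_epochs epochs, then
    stays at the scaled value.
    """
    target_lr = base_lr * num_workers  # rate scaled by the number of workers

    # After the warmup period (or if warmup is disabled), use the scaled rate.
    if warmup_epochs <= 0 or epoch >= warmup_epochs:
        return target_lr

    # During warmup, interpolate linearly between the base and scaled rates.
    progress = epoch / warmup_epochs
    return base_lr + progress * (target_lr - base_lr)
```

For example, with four GPUs as in the course exercise, `warmup_learning_rate(0.001, 4, epoch)` starts at the single-GPU rate at epoch 0 and reaches four times that rate once the warmup period ends. Starting low and ramping up is meant to avoid unstable early training when the effective batch size grows with the number of workers.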


Prerequisites:

Competency in the Python programming language and experience training deep learning models in Python


Tools, libraries, frameworks used: Horovod, TensorFlow 2, Keras