Mastering Data Analytics on AWS: From Data Lakes to Real-Time Insights (AMWSMDAS)

Learn to build operational data lakes, batch data analytics, and streaming solutions using AWS services like AWS Lake Formation, Amazon EMR, Amazon Redshift, Amazon Kinesis, and Amazon MSK. This course covers the entire data pipeline, including data collection, ingestion, cataloging, storage, processing, and analysis for both structured and unstructured data. Emphasis is on security, performance, and cost management.


Course Level: Intermediate

Duration: 4 days


Activities

This course includes presentations, interactive demos, practice labs, discussions, and class exercises.

Course Objectives

In this course, you will learn to:

  • Compare data warehouses, data lakes, and modern data architectures.
  • Design and implement data lake, batch data analytics, and streaming data analytics solutions.
  • Secure data at rest and in transit.
  • Ingest, store, transform, query, analyze, and visualize data.
  • Monitor and manage analytics workloads.
  • Apply cost management best practices.

Intended Audience

This course is intended for:

  • Data engineers and architects
  • Developers
  • Data warehouse engineers
  • Data platform engineers
  • Solutions architects
  • IT professionals

Prerequisites

We recommend that attendees of this course have:

  • Completed AWS Technical Essentials or Architecting on AWS
  • One year of experience in data analytics, real-time applications, or managing data warehouses
  • Completed Building Data Lakes on AWS

Building Data Lakes on AWS (ANBDLK)

Module A: Overview of Data Analytics and the Data Pipeline

  • Data analytics use cases
  • Using the data pipeline for analytics

Module 1: Introduction to Data Lakes

  • Value and components of data lakes
  • Common architectures

Module 2: Data Ingestion, Cataloging, and Preparation

  • Data lake storage and ingestion
  • AWS Glue crawlers and data catalog
  • Lab: Set up a simple data lake

Module 3: Data Processing and Analytics

  • Data processing with AWS Glue
  • Analyzing data with Amazon Athena

Module 4: Building a Data Lake with AWS Lake Formation

  • Features and benefits of AWS Lake Formation
  • Lab: Build a data lake

Module 5: Additional Lake Formation Configurations

  • Automate data lake creation
  • Visualize data with Amazon QuickSight
  • Lab: Data visualization using Amazon QuickSight

Module 6: Architecture and Course Review

  • Post course knowledge check
  • Architecture review

Building Batch Data Analytics Solutions on AWS (DABATC)

Module 1: Introduction to Amazon EMR

  • Using Amazon EMR in analytics solutions
  • EMR cluster architecture
  • Interactive Demo: Launching an EMR cluster

Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage

  • Storage optimization
  • Data ingestion techniques

Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR

  • Apache Spark use cases
  • Spark concepts
  • Interactive Demo: Connect to an EMR cluster
  • Practice Lab: Low-latency data analytics using Apache Spark

Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive

  • Using Hive to process batch data
  • Practice Lab: Batch data processing with Hive

Module 5: Serverless Data Processing

  • Serverless data processing with AWS Glue
  • Practice Lab: Orchestrate data processing with AWS Step Functions

Module 6: Security and Monitoring of Amazon EMR Clusters

  • Securing EMR clusters
  • Interactive Demo: Client-side encryption with EMRFS
  • Monitoring and troubleshooting

Module 7: Designing Batch Data Analytics Solutions

  • Batch data analytics use cases
  • Activity: Designing a batch data analytics workflow

Module B: Developing Modern Data Architectures on AWS

  • Modern data architectures


Building Data Analytics Solutions Using Amazon Redshift (DAREDS)

Module 1: Using Amazon Redshift in the Data Analytics Pipeline

  • Overview of Amazon Redshift
  • Setting up your data warehouse

Module 2: Introduction to Amazon Redshift

  • Architecture and features
  • Interactive Demos and Practice Labs

Module 3: Ingestion and Storage

  • Techniques and demos
  • Practice Labs

Module 4: Processing and Optimizing Data

  • Transformation and querying
  • Resource management

Module 5: Security and Monitoring of Amazon Redshift Clusters

  • Securing and monitoring clusters

Module 6: Designing Data Warehouse Analytics Solutions

  • Use case review and workflow design


Building Streaming Data Analytics Solutions on AWS (DASTRM)

Module 1: Using Streaming Services in the Data Analytics Pipeline

  • Importance and concepts of streaming data analytics

Module 2: Introduction to AWS Streaming Services

  • Kinesis and MSK
  • Demos and Practice Labs

Module 3: Using Amazon Kinesis for Real-time Data Analytics

  • Workloads and streams
  • Demos and Practice Labs

Module 4: Securing, Monitoring, and Optimizing Amazon Kinesis

  • Best practices

Module 5: Using Amazon MSK in Streaming Data Analytics Solutions

  • Use cases and clusters
  • Demos and Practice Labs

Module 6: Securing, Monitoring, and Optimizing Amazon MSK

  • Best practices

Module 7: Designing Streaming Data Analytics Solutions

  • Use case review and workflow design

This course equips you with the skills to efficiently build and manage data lake, batch data analytics, and streaming data analytics solutions on AWS.