Mastering Data Analytics on AWS: From Data Lakes to Real-Time Insights
(AMWSMDAS)
Learn to build operational data lakes, batch data analytics, and streaming solutions using AWS services like AWS Lake Formation, Amazon EMR, Amazon Redshift, Amazon Kinesis, and Amazon MSK. This course covers the entire data pipeline, including data collection, ingestion, cataloging, storage, processing, and analysis for both structured and unstructured data. Emphasis is on security, performance, and cost management.
Course Level: Intermediate
Duration: 4 days
Activities
This course includes presentations, interactive demos, practice labs, discussions, and class exercises.
Course Objectives
In this course, you will learn to:
- Compare data warehouses, data lakes, and modern data architectures.
- Design and implement data lake, batch data analytics, and streaming data analytics solutions.
- Secure data at rest and in transit.
- Ingest, store, transform, query, analyze, and visualize data.
- Monitor and manage analytics workloads.
- Apply cost management best practices.
Intended Audience
This course is intended for:
- Data engineers and architects
- Developers
- Data warehouse engineers
- Data platform engineers
- Solutions architects
- IT professionals
Prerequisites
We recommend that attendees of this course have:
- Completed AWS Technical Essentials or Architecting on AWS
- One year of experience in data analytics, real-time applications, or managing data warehouses
- Completed Building Data Lakes on AWS
Building Data Lakes on AWS (ANBDLK)
Module A: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
- Using the data pipeline for analytics
Module 1: Introduction to Data Lakes
- Value and components of data lakes
- Common architectures
Module 2: Data Ingestion, Cataloging, and Preparation
- Data lake storage and ingestion
- AWS Glue crawlers and data catalog
- Lab: Set up a simple data lake
Module 3: Data Processing and Analytics
- Data processing with AWS Glue
- Analyzing data with Amazon Athena
Module 4: Building a Data Lake with AWS Lake Formation
- Features and benefits of AWS Lake Formation
- Lab: Build a data lake
Module 5: Additional Lake Formation Configurations
- Automate data lake creation
- Visualize data with Amazon QuickSight
- Lab: Data visualization using Amazon QuickSight
Module 6: Architecture and Course Review
- Post-course knowledge check
- Architecture review
Building Batch Data Analytics Solutions on AWS (DABATC)
Module 1: Introduction to Amazon EMR
- Using Amazon EMR in analytics solutions
- EMR cluster architecture
- Interactive Demo: Launching an EMR cluster
Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage
- Storage optimization
- Data ingestion techniques
Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR
- Apache Spark use cases
- Spark concepts
- Interactive Demo: Connect to an EMR cluster
- Practice Lab: Low-latency data analytics using Apache Spark
Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive
- Using Hive to process batch data
- Practice Lab: Batch data processing with Hive
Module 5: Serverless Data Processing
- Serverless data processing with AWS Glue
- Practice Lab: Orchestrate data processing with AWS Step Functions
Module 6: Security and Monitoring of Amazon EMR Clusters
- Securing EMR clusters
- Interactive Demo: Client-side encryption with EMRFS
- Monitoring and troubleshooting
Module 7: Designing Batch Data Analytics Solutions
- Batch data analytics use cases
- Activity: Designing a batch data analytics workflow
Module B: Developing Modern Data Architectures on AWS
- Modern data architectures
Building Data Analytics Solutions Using Amazon Redshift (DAREDS)
Module 1: Using Amazon Redshift in the Data Analytics Pipeline
- Overview of Amazon Redshift
- Setting up your data warehouse
Module 2: Introduction to Amazon Redshift
- Architecture and features
- Interactive Demos and Practice Labs
Module 3: Ingestion and Storage
- Techniques and demos
- Practice Labs
Module 4: Processing and Optimizing Data
- Transformation and querying
- Resource management
Module 5: Security and Monitoring of Amazon Redshift Clusters
- Securing and monitoring clusters
Module 6: Designing Data Warehouse Analytics Solutions
- Use case review and workflow design
Building Streaming Data Analytics Solutions on AWS (DASTRM)
Module 1: Using Streaming Services in the Data Analytics Pipeline
- Importance and concepts of streaming data analytics
Module 2: Introduction to AWS Streaming Services
- Kinesis and MSK
- Demos and Practice Labs
Module 3: Using Amazon Kinesis for Real-Time Data Analytics
- Workloads and streams
- Demos and Practice Labs
Module 4: Securing, Monitoring, and Optimizing Amazon Kinesis
- Best practices
Module 5: Using Amazon MSK in Streaming Data Analytics Solutions
- Use cases and clusters
- Demos and Practice Labs
Module 6: Securing, Monitoring, and Optimizing Amazon MSK
- Best practices
Module 7: Designing Streaming Data Analytics Solutions
- Use case review and workflow design
This course equips you with the skills to efficiently build and manage data lakes, batch data analytics, and streaming data analytics solutions on AWS.