Implement a data engineering solution with Azure Databricks
(DP-3027)
Coming Soon
Learn how to harness Apache Spark and powerful clusters running on the Azure Databricks platform to run large data engineering workloads in the cloud.
Audience Profile
Data engineers, data scientists, and ELT developers who want to harness Apache Spark and powerful clusters running on the Azure Databricks platform to run large data engineering workloads in the cloud.
Course Modules
Perform incremental processing with Spark Structured Streaming
In this module, you explore features and tools that help you understand and work with incremental processing using Spark Structured Streaming.
- Introduction
- Set up real-time data sources for incremental processing
- Optimize Delta Lake for incremental processing in Azure Databricks
- Handle late data and out-of-order events in incremental processing
- Monitoring and performance tuning strategies for incremental processing in Azure Databricks
- Exercise - Real-time ingestion and processing with Delta Live Tables in Azure Databricks
- Module assessment
- Summary
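The late-data handling and incremental-processing ideas in this module can be sketched as follows. This is a minimal, hypothetical example, assuming a Databricks cluster where `spark` is the active SparkSession; the table names, timestamp column, and checkpoint path are illustrative, not part of the course materials.

```python
# Hypothetical sketch: incremental processing with Spark Structured Streaming.
# Assumes a Databricks cluster where `spark` is the active SparkSession.
from pyspark.sql.functions import window, col

events = (
    spark.readStream
    .format("delta")           # read newly arriving rows from a Delta table
    .table("raw_events")       # assumed source table
)

# Handle late and out-of-order events: keep state for data up to 10 minutes late,
# then drop it so state does not grow without bound.
counts = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(window(col("event_time"), "5 minutes"), col("device_id"))
    .count()
)

(
    counts.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/events_agg")  # enables restart/recovery
    .toTable("events_agg")     # assumed sink table
)
```

The checkpoint location is what makes the processing incremental across restarts: Spark records which source data it has already consumed, so only new rows are processed on each trigger.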
Implement streaming architecture patterns with Delta Live Tables
You explore different features and tools to help you develop architecture patterns with Azure Databricks Delta Live Tables.
- Introduction
- Event-driven architectures with Delta Live Tables
- Ingest data with structured streaming
- Maintain data consistency and reliability with structured streaming
- Scale streaming workloads with Delta Live Tables
- Exercise - End-to-end streaming pipeline with Delta Live Tables
- Module assessment
- Summary
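A minimal sketch of the ingestion and data-consistency ideas above. It is hypothetical and only runs inside a Databricks Delta Live Tables pipeline, which provides the `dlt` module and `spark`; the table names and landing path are assumptions for illustration.

```python
# Hypothetical Delta Live Tables pipeline definition (runs only inside a
# Databricks DLT pipeline, where `dlt` and `spark` are provided).
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw clickstream ingested incrementally with Auto Loader")
def clicks_bronze():
    return (
        spark.readStream.format("cloudFiles")   # Auto Loader for incremental file ingestion
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/clicks")            # assumed landing path
    )

@dlt.table(comment="Cleaned clicks")
@dlt.expect_or_drop("valid_user", "user_id IS NOT NULL")  # data-quality expectation
def clicks_silver():
    return dlt.read_stream("clicks_bronze").where(col("event_type") == "click")
```

The expectation decorator is how DLT maintains consistency and reliability declaratively: rows that fail the predicate are dropped, and the violation counts surface in the pipeline's event log.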
Optimize performance with Spark and Delta Live Tables
Learn how to optimize performance with Spark and Delta Live Tables in Azure Databricks.
- Introduction
- Optimize performance with Spark and Delta Live Tables
- Perform cost-based optimization and query tuning
- Use change data capture (CDC)
- Use enhanced autoscaling
- Implement observability and data quality metrics
- Exercise - Optimize data pipelines for better performance in Azure Databricks
- Module assessment
- Summary
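The file-compaction and cost-based-optimization topics above can be illustrated with a short Databricks SQL sketch; the `sales.orders` table and `customer_id` column are hypothetical names.

```sql
-- Hypothetical tuning sketch, assuming a Delta table named sales.orders.
-- Compact small files and co-locate rows frequently filtered by customer_id:
OPTIMIZE sales.orders ZORDER BY (customer_id);

-- Let Databricks optimize writes and compact files automatically:
ALTER TABLE sales.orders
  SET TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true,
                     delta.autoOptimize.autoCompact   = true);

-- Inspect the statistics the cost-based optimizer would use for this query:
EXPLAIN COST
SELECT customer_id, SUM(amount) FROM sales.orders GROUP BY customer_id;
```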
Implement CI/CD workflows in Azure Databricks
Learn how to implement CI/CD workflows in Azure Databricks to automate the integration and delivery of code changes.
- Introduction
- Implement version control and Git integration
- Perform unit testing and integration testing
- Manage and configure your environment
- Implement rollback and roll-forward strategies
- Exercise - Implement CI/CD workflows
- Module assessment
- Summary
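One practical way to make the unit-testing step above concrete is to keep transformation logic in plain functions rather than notebook cells, so CI can test it without a cluster. The function and its business rule below are hypothetical, purely for illustration.

```python
# Hypothetical example of unit-testable pipeline logic: keeping transformations
# as plain Python functions lets a CI job test them before deploying to Databricks.

def normalize_country(code: str) -> str:
    """Map free-form country codes to ISO alpha-2 (hypothetical business rule)."""
    aliases = {"UK": "GB", "U.S.": "US", "USA": "US"}
    cleaned = code.strip().upper()
    return aliases.get(cleaned, cleaned)

# A CI job (e.g., triggered by a Git push) would run tests like this with pytest:
def test_normalize_country():
    assert normalize_country(" usa ") == "US"
    assert normalize_country("UK") == "GB"
    assert normalize_country("DE") == "DE"
```

When such tests pass in CI, the same code can be promoted to the workspace, which is the integration/delivery half of the workflow.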
Automate workloads with Azure Databricks Jobs
Learn how to orchestrate and schedule data workflows with Azure Databricks Jobs. Define and monitor complex pipelines, integrate with tools like Azure Data Factory and Azure DevOps, and reduce manual intervention, leading to improved efficiency, faster insights, and adaptability to business needs.
- Introduction
- Implement job scheduling and automation
- Optimize workflows with parameters
- Handle dependency management
- Implement error handling and retry mechanisms
- Explore best practices and guidelines
- Exercise - Automate data ingestion and processing
- Module assessment
- Summary
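The retry mechanisms covered above follow a common pattern, sketched below in plain Python. The helper is hypothetical; in a real job you would typically rely on the retry settings of Databricks Jobs rather than hand-rolling this, but the logic is the same.

```python
import time

# Hypothetical retry helper illustrating the retry-with-backoff pattern
# that job schedulers apply to transient task failures.
def run_with_retries(task, max_retries=3, backoff_seconds=1.0):
    """Run `task`, retrying on failure with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise                                        # retries exhausted: surface the error
            time.sleep(backoff_seconds * (2 ** attempt))     # 1s, 2s, 4s, ...

# Example: a flaky task that succeeds on its third invocation.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = run_with_retries(flaky, max_retries=3, backoff_seconds=0.01)  # -> "ok"
```

Exponential backoff gives a transient fault (a throttled API, a briefly unavailable cluster) time to clear instead of hammering it with immediate retries.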
Manage data privacy and governance with Azure Databricks
In this module, you explore features and approaches that help you secure and manage your data within Azure Databricks using tools such as Unity Catalog.
- Introduction
- Implement data encryption techniques in Azure Databricks
- Manage access controls in Azure Databricks
- Implement data masking and anonymization in Azure Databricks
- Use compliance frameworks and secure data sharing in Azure Databricks
- Use data lineage and metadata management
- Implement governance automation in Azure Databricks
- Exercise - Practice the implementation of Unity Catalog
- Module assessment
- Summary
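The masking and anonymization unit above can be illustrated with a small pseudonymization sketch, assuming salted hashing as the masking technique; the function, salt, and email address are hypothetical.

```python
import hashlib

# Hypothetical pseudonymization helper: replaces a direct identifier with a
# salted SHA-256 token, a common masking technique before sharing data.
def pseudonymize(value: str, salt: str) -> str:
    """Deterministically mask an identifier; same input + salt -> same token."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]

token = pseudonymize("alice@example.com", salt="s3cret")
# Deterministic: repeated calls agree, so joins on the token still work.
assert token == pseudonymize("alice@example.com", salt="s3cret")
# A different salt yields an unlinkable token.
assert token != pseudonymize("alice@example.com", salt="other")
```

Keeping the salt secret (for example, in a Databricks secret scope) is what prevents recipients of the shared data from reversing the mapping by hashing candidate identifiers themselves.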
Use SQL Warehouses in Azure Databricks
Azure Databricks provides SQL Warehouses that enable data analysts to work with data using familiar relational SQL queries.
- Introduction
- Get started with SQL Warehouses
- Create databases and tables
- Create queries and dashboards
- Exercise - Use a SQL Warehouse in Azure Databricks
- Module assessment
- Summary
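The database, table, and query workflow above might look like the following in the SQL editor of a warehouse; the `retail.orders` schema is a hypothetical example, not from the course.

```sql
-- Hypothetical queries a data analyst might run against a SQL Warehouse.
CREATE DATABASE IF NOT EXISTS retail;

CREATE TABLE IF NOT EXISTS retail.orders (
  order_id   BIGINT,
  order_date DATE,
  amount     DECIMAL(10, 2)
);

-- A query suitable for a dashboard visualization: revenue per day.
SELECT order_date, SUM(amount) AS revenue
FROM retail.orders
GROUP BY order_date
ORDER BY order_date;
```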
Run Azure Databricks Notebooks with Azure Data Factory
Using pipelines in Azure Data Factory to run notebooks in Azure Databricks enables you to automate data engineering processes at cloud scale.
- Introduction
- Understand Azure Databricks notebooks and pipelines
- Create a linked service for Azure Databricks
- Use a Notebook activity in a pipeline
- Use parameters in a notebook
- Exercise - Run an Azure Databricks Notebook with Azure Data Factory
- Module assessment
- Summary
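The parameter-passing unit above can be sketched as a notebook cell; it runs only inside a Databricks notebook, where `dbutils` and `spark` are provided, and the widget name, mount path, and table name are hypothetical.

```python
# Hypothetical notebook cell: receiving a parameter from an Azure Data Factory
# Notebook activity (runs only in Databricks, where `dbutils` is provided).
dbutils.widgets.text("folder", "raw")       # declare the parameter with a default
folder = dbutils.widgets.get("folder")      # value supplied by the ADF pipeline

df = spark.read.parquet(f"/mnt/data/{folder}")            # assumed mount path
df.write.mode("overwrite").saveAsTable(f"staging_{folder}")

# Optionally return a value to the calling pipeline:
dbutils.notebook.exit(f"processed {df.count()} rows")
```

In Azure Data Factory, the value for `folder` is set under the Notebook activity's base parameters, which is how one notebook can be reused across many pipeline runs.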